-
Notifications
You must be signed in to change notification settings - Fork 2k
Use mincore(2) to create diff snapshots without dirty page tracking #5274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use mincore(2) to create diff snapshots without dirty page tracking #5274
Conversation
0622764
to
7379503
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5274 +/- ##
==========================================
+ Coverage 82.92% 83.00% +0.08%
==========================================
Files 250 250
Lines 26844 26879 +35
==========================================
+ Hits 22260 22312 +52
+ Misses 4584 4567 -17
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
65690d4
to
9429bc6
Compare
20e2ca8
to
2fe1ba9
Compare
c5a6f74
to
fbb2e4d
Compare
f20dff0
to
31e6169
Compare
31e6169
to
5199aa0
Compare
f61fb5b
to
658de5a
Compare
Include a sentence about diff snapshots producing sparse files in our snapshot documentation. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
The rebase-snap tool is deprecated and snapshot-editor should be used. Since the two ways of merging snapshot files are completely equivalent, let's remove the rebase-snap instructions and only keep the snapshot-editor ones, to avoid people using deprecated tools. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Currently, Firecracker only supports creation of diff snapshots if dirty page tracking is explicitly enabled. Allow creation of diff snapshots even if it is not enabled, through the use of mincore(2). The mincore(2) syscalls determines which pages of a VMA are "in core". For anonymous mappings (as used by booted VMs without vhost-user devices), this refers to all pages that are currently faulted in. For memfd (as used by booted vms with vhost-user devices), this means all pages that have been allocated into the memfd, regardless of whether they were allocated through the VMA on which mincore(2) was called (meaning creation of mincore-diff-snapshots will correctly account for pages that were only touched by the vhost-user backend, but not by Firecracker or KVM). For restored VMs, this means all pages of the underlying snapshot file that have been faulted in. Note that this only works if swap has been disabled, as pages currently swapped to disk do not count as "in-core", yet obviously should be included in a diff snapshot. If swap is used, dirty page tracking MUST be enabled for diff snapshots to work correctly. Compared to diff snapshots based on dirty page tracking, mincore-based diff snapshots will be slightly larger. This is because dirty page tracked diff snapshots only include pages that were actually written to, while mincore-based snapshots will contain all pages that were accessed at all, e.g. even if only for reading. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
We have two different API parameters that do the same thing in two different APIs: For booted VMs, we enable KVM dirty page tracking by setting track_dirty_pages to true in /machine-config. For resumed VMs, we enable KVM dirty page tracking by setting enable_diff_snapshots to true in /snapshot/load. Apart from being inconsistent for no reason, one of these is very badly named: With support for diff snapshots without dirty page tracking, enable_diff_snapshots is a misnomer, because setting it to false will still allow us to do diff snapshots, it'll just fall back to mincore. Rename enable_diff_snapshots to track_dirty_pages, so we're consistent and also so that the parameter reflects what is actually happening. Go through the whole deprecation cycle of deprecating enable_diff_snapshots and adding the new track_dirty_pages parameter at the same time. We will need to remove enable_diff_snapshots in 2.0 Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Document that diff snapshots can work even without dirty page tracking, and what the drawbacks of this are (bigger snapshots, no swap support). Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
We were using this property as a proxy for two things: "Does this snapshot rebase rebasing before resumptions?" and "do I need to enable dirty page tracking to create another snapshot of this type?". With mincore-snapshots, these two questions are no longer equivalent (mincore snapshots need rebase, but do _not_ need dirty page tracking), so untangle this property into two properties. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
With support for "mincore diff snapshots", we'll have two snapshot types that map to "diff" in the firecracker API, so the enum can no longer be string based. Instead just have a property that translates to Firecracker API types. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Just always demand the enum. The internal conversion from str to enum now doesnt work anymore with the enum no longer being str-based. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Have them use the correct needs_rebase/needs_dirty_tracking helpers instead. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Now that the test framework correctly differentiates between the need for rebase and the need for dirty page tracking, start running tests with mincore snapshots. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Just have a single test that is parametrized by snapshot type. This will then automatically also test mincore diff snapshots. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Fix various minor errors: - Drop some specifics on the cgroups v1 disclaimer, because all supported host kernel versions are "5.4+" - Do not claim that creating a snapshot has no effect on the running VM, because that's not true. - Cut down on some repeated and confusing information / examples near the end. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Extend the limitation section of the huge pages docs to include a note about how dirty page tracking negates the benefits of huge pages. Also add a cross-reference to the diff snapshot docs. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
At the time of writing the docs, that was temporarily the case in v1.6.0, but was reverted in scenarios where no vhost-user devices are attached due to performance regressions. Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
658de5a
to
4cdcc76
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Currently, Firecracker only supports creation of diff snapshots if dirty
page tracking is explicitly enabled. Allow creation of diff snapshots
even if it is not enabled, through the use of mincore(2). The mincore(2)
syscalls determines which pages of a VMA are "in core". For anonymous
mappings (as used by booted VMs without vhost-user devices), this refers
to all pages that are currently faulted in. For memfd (as used by booted
vms with vhost-user devices), this means all
pages that have been allocated into the memfd, regardless of whether
they were allocated through the VMA on which mincore(2) was called
(meaning creation of mincore-diff-snapshots will correctly account for
pages that were only touched by the vhost-user backend, but not by
Firecracker or KVM). For restored VMs, this means all pages of the
underlying snapshot file that have been faulted in.
Note that this only works if swap has been disabled, as pages currently
swapped to disk do not count as "in-core", yet obviously should be
included in a diff snapshot. If swap is used, dirty page tracking MUST
be enabled for diff snapshots to work correctly.
Compared to diff snapshots based on dirty page tracking, mincore-based
diff snapshots will be slightly larger. This is because dirty page
tracked diff snapshots only include pages that were actually written to,
while mincore-based snapshots will contain all pages that were accessed
at all, e.g. even if only for reading.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
tools/devtool checkstyle
to verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md
.Runbook for Firecracker API changes.
integration tests.
TODO
.rust-vmm
.